Conditional Probability



15.1 Definition 



Suppose that we pick a random person in the world. Everyone has an equal chance 
of being selected. Let A be the event that the person is an MIT student, and let 
B be the event that the person lives in Cambridge. What are the probabilities of 
these events? Intuitively, we're picking a random point in the big ellipse shown in 
Figure 15.1 and asking how likely that point is to fall into region A or B. 



Figure 15.1 Selecting a random person. A is the event that the person is an MIT
student. B is the event that the person lives in Cambridge.

The vast majority of people in the world neither live in Cambridge nor are MIT 
students, so events A and B both have low probability. But what about the prob- 
ability that a person is an MIT student, given that the person lives in Cambridge? 
This should be much greater — but what is it exactly? 

What we're asking for is called a conditional probability; that is, the probability 
that one event happens, given that some other event definitely happens. Questions 
about conditional probabilities come up all the time: 

• What is the probability that it will rain this afternoon, given that it is cloudy 
this morning? 




• What is the probability that two rolled dice sum to 10, given that both are 
odd? 

• What is the probability that I'll get four-of-a-kind in Texas No Limit Hold 
'Em Poker, given that I'm initially dealt two queens? 

There is a special notation for conditional probabilities. In general, Pr[A | B]
denotes the probability of event A, given that event B happens. So, in our example,
Pr[A | B] is the probability that a random person is an MIT student, given that he
or she is a Cambridge resident.

How do we compute Pr[A | B]? Since we are given that the person lives in
Cambridge, we can forget about everyone in the world who does not. Thus, all
outcomes outside event B are irrelevant. So, intuitively, Pr[A | B] should be the
fraction of Cambridge residents that are also MIT students; that is, the answer
should be the probability that the person is in set A ∩ B (the darkly shaded region
in Figure 15.1) divided by the probability that the person is in set B (the lightly
shaded region). This motivates the definition of conditional probability:

Definition 15.1.1.

\[ \Pr[A \mid B] ::= \frac{\Pr[A \cap B]}{\Pr[B]} \]

If Pr[B] = 0, then the conditional probability Pr[A | B] is undefined.
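To make the definition concrete, here is a minimal Python sketch that applies it to one of the questions listed earlier: the probability that two rolled dice sum to 10, given that both are odd. It assumes a uniform sample space of ordered rolls; the helper names `pr` and `cond_pr` are ours, chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def pr(event, space):
    """Probability of `event` (a set of outcomes) in a uniform sample space."""
    return Fraction(len(event & space), len(space))

def cond_pr(a, b, space):
    """Pr[A | B] = Pr[A ∩ B] / Pr[B]; returns None when Pr[B] = 0 (undefined)."""
    if pr(b, space) == 0:
        return None
    return pr(a & b, space) / pr(b, space)

# Sample space: all 36 ordered rolls of two fair dice.
space = set(product(range(1, 7), repeat=2))
sum_is_10 = {r for r in space if sum(r) == 10}
both_odd = {r for r in space if r[0] % 2 == 1 and r[1] % 2 == 1}

print(cond_pr(sum_is_10, both_odd, space))  # 1/9
```

Of the nine rolls in which both dice are odd, only (5, 5) sums to 10, which is where the 1/9 comes from.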

Pure probability is often counterintuitive, but conditional probability is even
worse! Conditioning can subtly alter probabilities and produce unexpected results
in randomized algorithms and computer systems as well as in betting games. Yet, 
the mathematical definition of conditional probability given above is very simple 
and should give you no trouble — provided that you rely on formal reasoning and 
not intuition. The four-step method will also be very helpful as we will see in the 
next examples. 



15.2 Using the Four-Step Method to Determine Conditional 
Probability 

15.2.1 The "Halting Problem" 

The Halting Problem was the first example of a property that could not be tested 
by any program. It was introduced by Alan Turing in his seminal 1936 paper. The 
problem is to determine whether a Turing machine halts on a given . . . yadda yadda 






"mcs-ftl" — 2010/9/8 — 0:40 — page 419 — #425 






yadda . . . more importantly, it was the name of the MIT EECS department's famed 
C-league hockey team. 

In a best-of-three tournament, the Halting Problem wins the first game with prob- 
ability 1/2. In subsequent games, their probability of winning is determined by the 
outcome of the previous game. If the Halting Problem won the previous game, 
then they are invigorated by victory and win the current game with probability 2/3. 
If they lost the previous game, then they are demoralized by defeat and win the 
current game with probability only 1/3. What is the probability that the Halting 
Problem wins the tournament, given that they win the first game? 

This is a question about a conditional probability. Let A be the event that the
Halting Problem wins the tournament, and let B be the event that they win the first
game. Our goal is then to determine the conditional probability Pr[A | B].

We can tackle conditional probability questions just like ordinary probability
problems: using a tree diagram and the four-step method. A complete tree diagram
is shown in Figure 15.2.



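[Tree diagram with one level of branching for each of game 1, game 2, and game 3.
Its leaves are tabulated below; a check mark means the outcome is in the event.]

outcome   event A: win the series   event B: win game 1   outcome probability
WW                   ✓                       ✓                    1/3
WLW                  ✓                       ✓                    1/18
WLL                                          ✓                    1/9
LWW                  ✓                                            1/9
LWL                                                               1/18
LL                                                                1/3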



Figure 15.2 The tree diagram for computing the probability that the "Halting 
Problem" wins two out of three games given that they won the first game. 



Step 1: Find the Sample Space 

Each internal vertex in the tree diagram has two children, one corresponding to
a win for the Halting Problem (labeled W) and one corresponding to a loss
(labeled L). The complete sample space is:

S = {WW, WLW, WLL, LWW, LWL, LL}.

Step 2: Define Events of Interest

The event that the Halting Problem wins the whole tournament is:

A = {WW, WLW, LWW}.

And the event that the Halting Problem wins the first game is:

B = {WW, WLW, WLL}.

The outcomes in these events are indicated with check marks in the tree diagram in 
Figure 15.2. 

Step 3: Determine Outcome Probabilities 

Next, we must assign a probability to each outcome. We begin by labeling edges
as specified in the problem statement. Specifically, the Halting Problem has a 1/2
chance of winning the first game, so the two edges leaving the root are each
assigned probability 1/2. Other edges are labeled 1/3 or 2/3 based on the outcome
of the preceding game. We then find the probability of each outcome by multiplying
all probabilities along the corresponding root-to-leaf path. For example, the
probability of outcome WLL is:

\[ \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{2}{3} = \frac{1}{9}. \]
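The edge-labeling procedure can be sketched in Python as a recursive walk over the tree. This is only an illustration under the rules stated in the problem; the function name `outcome_probabilities` and the string encoding of outcomes are our own choices.

```python
from fractions import Fraction

def outcome_probabilities():
    """Multiply edge probabilities along every root-to-leaf path of the tree."""
    outcomes = {}

    def play(history, prob):
        # A best-of-three series ends as soon as either side has two wins.
        if history.count("W") == 2 or history.count("L") == 2:
            outcomes[history] = prob
            return
        if not history:                 # first game
            p_win = Fraction(1, 2)
        elif history[-1] == "W":        # invigorated by victory
            p_win = Fraction(2, 3)
        else:                           # demoralized by defeat
            p_win = Fraction(1, 3)
        play(history + "W", prob * p_win)
        play(history + "L", prob * (1 - p_win))

    play("", Fraction(1))
    return outcomes

probs = outcome_probabilities()
for outcome, p in probs.items():
    print(outcome, p)
```

The six printed probabilities should agree with the outcome column of Figure 15.2, and they sum to 1.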

Step 4: Compute Event Probabilities

We can now compute the probability that the Halting Problem wins the tournament,
given that they win the first game:

\[
\begin{aligned}
\Pr[A \mid B] &= \frac{\Pr[A \cap B]}{\Pr[B]} \\
&= \frac{\Pr[\{WW, WLW\}]}{\Pr[\{WW, WLW, WLL\}]} \\
&= \frac{1/3 + 1/18}{1/3 + 1/18 + 1/9} \\
&= \frac{7}{9}.
\end{aligned}
\]

We're done! If the Halting Problem wins the first game, then they win the whole 
tournament with probability 7/9. 
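As a quick check of Step 4, the following sketch recomputes the answer from the outcome probabilities in Figure 15.2 using exact fractions. The helpers `pr` and `cond_pr` are illustrative names, not part of the text.

```python
from fractions import Fraction

# Outcome probabilities copied from Figure 15.2.
probs = {
    "WW": Fraction(1, 3), "WLW": Fraction(1, 18), "WLL": Fraction(1, 9),
    "LWW": Fraction(1, 9), "LWL": Fraction(1, 18), "LL": Fraction(1, 3),
}

def pr(event):
    """Probability of an event, i.e. the sum of its outcome probabilities."""
    return sum((probs[o] for o in event), Fraction(0))

def cond_pr(a, b):
    """Pr[A | B] by Definition 15.1.1 (assumes Pr[B] is nonzero)."""
    return pr(a & b) / pr(b)

A = {o for o in probs if o.count("W") == 2}  # wins the tournament
B = {o for o in probs if o[0] == "W"}        # wins the first game

print(cond_pr(A, B))  # 7/9
```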

15.2.2 Why Tree Diagrams Work

We've now settled into a routine of solving probability problems using tree dia- 
grams. But we've left a big question unaddressed: what is the mathematical justifi- 
cation behind those funny little pictures? Why do they work? 

The answer involves conditional probabilities. In fact, the probabilities that 
we've been recording on the edges of tree diagrams are conditional probabilities. 
For example, consider the uppermost path in the tree diagram for the Halting Prob- 
lem, which corresponds to the outcome WW. The first edge is labeled 1/2, which 
is the probability that the Halting Problem wins the first game. The second edge 
is labeled 2/3, which is the probability that the Halting Problem wins the second 
game, given that they won the first — that's a conditional probability! More gener- 
ally, on each edge of a tree diagram, we record the probability that the experiment 
proceeds along that path, given that it reaches the parent vertex. 

So we've been using conditional probabilities all along. But why can we multiply 
edge probabilities to get outcome probabilities? For example, we concluded that: 

\[ \Pr[WW] = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}. \]

Why is this correct? 

The answer goes back to Definition 15.1.1 of conditional probability, which can
be written in a form called the Product Rule for probabilities:

Rule (Product Rule for 2 Events). If Pr[E_1] ≠ 0, then:

\[ \Pr[E_1 \cap E_2] = \Pr[E_1] \cdot \Pr[E_2 \mid E_1]. \]

Multiplying edge probabilities in a tree diagram amounts to evaluating the right 
side of this equation. For example: 

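\[
\begin{aligned}
\Pr[\text{win first game} \cap \text{win second game}]
&= \Pr[\text{win first game}] \cdot \Pr[\text{win second game} \mid \text{win first game}] \\
&= \frac{1}{2} \cdot \frac{2}{3}.
\end{aligned}
\]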

So the Product Rule is the formal justification for multiplying edge probabilities to 
get outcome probabilities! Of course to justify multiplying edge probabilities along 
longer paths, we need a Product Rule for n events. 

Rule (Product Rule for n Events).

\[
\Pr[E_1 \cap E_2 \cap \cdots \cap E_n] = \Pr[E_1] \cdot \Pr[E_2 \mid E_1] \cdot \Pr[E_3 \mid E_1 \cap E_2] \cdots \Pr[E_n \mid E_1 \cap E_2 \cap \cdots \cap E_{n-1}]
\]

provided that

\[ \Pr[E_1 \cap E_2 \cap \cdots \cap E_{n-1}] \neq 0. \]

This rule follows from the definition of conditional probability and induction 
on n. 
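The Product Rule can also be checked in the other direction: starting from the outcome probabilities of Figure 15.2, we can compute each conditional factor from the definition and confirm that the product recovers the probability of the path. The sketch below does this for the path W, L, W; the helpers `game` and `both` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# Outcome probabilities copied from Figure 15.2.
probs = {
    "WW": Fraction(1, 3), "WLW": Fraction(1, 18), "WLL": Fraction(1, 9),
    "LWW": Fraction(1, 9), "LWL": Fraction(1, 18), "LL": Fraction(1, 3),
}

def pr(pred):
    """Probability of the event consisting of outcomes that satisfy `pred`."""
    return sum((p for o, p in probs.items() if pred(o)), Fraction(0))

def game(i, result):
    """Event: game i (0-based) was played and ended with `result`."""
    return lambda o: len(o) > i and o[i] == result

def both(*preds):
    """Intersection of events given as predicates."""
    return lambda o: all(p(o) for p in preds)

e1, e2, e3 = game(0, "W"), game(1, "L"), game(2, "W")

p1 = pr(e1)                                      # Pr[E1]
p2 = pr(both(e1, e2)) / pr(e1)                   # Pr[E2 | E1]
p3 = pr(both(e1, e2, e3)) / pr(both(e1, e2))     # Pr[E3 | E1 ∩ E2]

print(p1, p2, p3)                      # 1/2 1/3 1/3
print(p1 * p2 * p3)                    # 1/18
print(pr(both(e1, e2, e3)))            # 1/18
```

The three factors are exactly the edge labels on the path to WLW, which is the point of the rule.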

15.2.3 Medical Testing 

There is an unpleasant condition called BO suffered by 10% of the population. 
There are no prior symptoms; victims just suddenly start to stink. Fortunately, 
there is a test for latent BO before things start to smell. The test is not perfect, 
however: 

• If you have the condition, there is a 10% chance that the test will say you do 
not. These are called "false negatives". 

• If you do not have the condition, there is a 30% chance that the test will say 
you do. These are "false positives". 

Suppose a random person is tested for latent BO. If the test is positive, then what 
is the probability that the person has the condition? 

Step 1: Find the Sample Space 

The sample space is found with the tree diagram in Figure 15.3. 

[Tree diagram. Columns: person has BO; test result; outcome probability;
event A: has BO; event B: tests positive; event A ∩ B.]




Figure 15.3 The tree diagram for the BO problem. 




Step 2: Define Events of Interest 

Let A be the event that the person has BO. Let B be the event that the test was
positive. The outcomes in each event are marked in the tree diagram. We want
to find Pr[A | B], the probability that a person has BO, given that the test was
positive.

Step 3: Find Outcome Probabilities 

First, we assign probabilities to edges. These probabilities are drawn directly from 
the problem statement. By the Product Rule, the probability of an outcome is the 
product of the probabilities on the corresponding root-to-leaf path. All probabilities 
are shown in Figure 15.3. 

Step 4: Compute Event Probabilities 

From Definition 15.1.1, we have 

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{0.09}{0.09 + 0.27} = \frac{1}{4}. \]

So, if you test positive, then there is only a 25% chance that you have the condition! 
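The four steps for the BO problem can be reproduced with a few lines of Python. This is a sketch built only from the numbers in the problem statement; the variable names and the dictionary layout are our own.

```python
from fractions import Fraction

p_bo = Fraction(1, 10)         # Pr[person has BO]
p_false_neg = Fraction(1, 10)  # Pr[test negative | has BO]
p_false_pos = Fraction(3, 10)  # Pr[test positive | no BO]

# Outcome probabilities: product of the edge labels on each root-to-leaf path.
outcomes = {
    ("BO", "pos"): p_bo * (1 - p_false_neg),
    ("BO", "neg"): p_bo * p_false_neg,
    ("healthy", "pos"): (1 - p_bo) * p_false_pos,
    ("healthy", "neg"): (1 - p_bo) * (1 - p_false_pos),
}

p_positive = outcomes[("BO", "pos")] + outcomes[("healthy", "pos")]  # Pr[B]
posterior = outcomes[("BO", "pos")] / p_positive                     # Pr[A | B]

print(posterior)  # 1/4
```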

This answer is initially surprising, but makes sense on reflection. There are two 
ways you could test positive. First, it could be that you have the condition and the 
test is correct. Second, it could be that you are healthy and the test is incorrect. The 
problem is that almost everyone is healthy; therefore, most of the positive results 
arise from incorrect tests of healthy people! 

We can also compute the probability that the test is correct for a random person. 
This event consists of two outcomes. The person could have the condition and 
test positive (probability 0.09), or the person could be healthy and test negative 
(probability 0.63). Therefore, the test is correct with probability 0.09 + 0.63 = 
0.72. This is a relief; the test is correct almost three-quarters of the time. 

But wait! There is a simple way to make the test correct 90% of the time: always 
return a negative result! This "test" gives the right answer for all healthy people 
and the wrong answer only for the 10% that actually have the condition. So a better 
strategy by this measure is to completely ignore the test result! 
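The comparison between the real test and the "always negative" strategy is simple arithmetic, sketched here with exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

p_bo = Fraction(1, 10)         # Pr[person has BO]
p_false_neg = Fraction(1, 10)  # Pr[test negative | has BO]
p_false_pos = Fraction(3, 10)  # Pr[test positive | no BO]

# The real test is correct on true positives and true negatives.
test_accuracy = p_bo * (1 - p_false_neg) + (1 - p_bo) * (1 - p_false_pos)

# A "test" that always says negative is correct exactly for healthy people.
always_negative_accuracy = 1 - p_bo

print(float(test_accuracy))             # 0.72
print(float(always_negative_accuracy))  # 0.9
```

By the accuracy measure alone the constant answer wins, even though it never identifies anyone who has the condition.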

There is a similar paradox in weather forecasting. During winter, almost all days 
in Boston are wet and overcast. Predicting miserable weather every day may be 
more accurate than really trying to get it right! 

15.3 A Posteriori Probabilities

If you think about it too much, the medical testing problem we just considered 
could start to trouble you. The concern would be that by the time you take the test, 
you either have the BO condition or you don't — you just don't know which it is. 
So you may wonder if a statement like "If you tested positive, then you have the 
condition with probability 25%" makes sense. 

In fact, such a statement does make sense. It means that 25% of the people who 
test positive actually have the condition. It is true that any particular person has it 
or they don't, but a randomly selected person among those who test positive will 
have the condition with probability 25%. 
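This frequency reading of the statement can be illustrated with a small simulation: test many random people and look at the fraction of positive results that belong to people with the condition. The sketch below is a hypothetical experiment, with a fixed seed and a population size that we picked arbitrarily.

```python
import random

def simulate(n, seed=0):
    """Estimate Pr[has BO | test positive] from n simulated people."""
    rng = random.Random(seed)
    positives = 0
    positives_with_bo = 0
    for _ in range(n):
        has_bo = rng.random() < 0.10
        if has_bo:
            positive = rng.random() >= 0.10  # 10% false negatives
        else:
            positive = rng.random() < 0.30   # 30% false positives
        if positive:
            positives += 1
            positives_with_bo += has_bo
    return positives_with_bo / positives

estimate = simulate(200_000)
print(round(estimate, 3))
```

The printed estimate should land close to 0.25, and it should get closer as the number of simulated people grows.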

Anyway, if the medical testing example bothers you, you will definitely be wor- 
ried by the following examples, which go even further down this path. 

15.3.1 The "Halting Problem," in Reverse 

Suppose that we turn the hockey question around: what is the probability that the 
Halting Problem won their first game, given that they won the series? 

This seems like an absurd question! After all, if the Halting Problem won the 
series, then the winner of the first game has already been determined. Therefore, 
who won the first game is a question of fact, not a question of probability. However, 
our mathematical theory of probability contains no notion of one event preceding 
another — there is no notion of time at all. Therefore, from a mathematical perspec- 
tive, this is a perfectly valid question. And this is also a meaningful question from 
a practical perspective. Suppose that you're told that the Halting Problem won the 
series, but not told the results of individual games. Then, from your perspective, it 
makes perfect sense to wonder how likely it is that the Halting Problem won the
first game. 

A conditional probability Pr[B | A] is called a posteriori if event B precedes
event A in time. Here are some other examples of a posteriori probabilities:

• The probability it was cloudy this morning, given that it rained in the after- 
noon. 

• The probability that I was initially dealt two queens in Texas No Limit Hold 
'Em poker, given that I eventually got four-of-a-kind. 

Mathematically, a posteriori probabilities are no different from ordinary probabil- 
ities; the distinction is only at a higher, philosophical level. Our only reason for 
drawing attention to them is to say, "Don't let them rattle you." 






Let's return to the original problem. The probability that the Halting Problem
won their first game, given that they won the series, is Pr[B | A]. We can compute
this using the definition of conditional probability and the tree diagram in
Figure 15.2:

\[ \Pr[B \mid A] = \frac{\Pr[B \cap A]}{\Pr[A]} = \frac{1/3 + 1/18}{1/3 + 1/18 + 1/9} = \frac{7}{9}. \]

This answer is suspicious! In the preceding section, we showed that Pr[A | B]
was also 7/9. Could it be true that Pr[A | B] = Pr[B | A] in general? Some
reflection suggests this is unlikely. For example, the probability that I feel uneasy,
given that I was abducted by aliens, is pretty large. But the probability that I was
abducted by aliens, given that I feel uneasy, is rather small.

Let's work out the general conditions under which Pr[A | B] = Pr[B | A].
By the definition of conditional probability, this equation holds if and only if:

\[ \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\Pr[A \cap B]}{\Pr[A]}. \]

This equation, in turn, holds only if the denominators are equal or the numerator
is 0; namely if

\[ \Pr[B] = \Pr[A] \quad \text{or} \quad \Pr[A \cap B] = 0. \]

The former condition holds in the hockey example; the probability that the Halting 
Problem wins the series (event A) is equal to the probability that it wins the first 
game (event B) since both probabilities are 1/2. 

In general, such pairs of probabilities are related by Bayes' Rule: 

Theorem 15.3.1 (Bayes' Rule). If Pr[A] and Pr[B] are nonzero, then:

\[ \Pr[B \mid A] = \frac{\Pr[A \mid B] \cdot \Pr[B]}{\Pr[A]} \tag{15.1} \]

Proof. When Pr[A] and Pr[B] are nonzero, we have

\[ \Pr[A \mid B] \cdot \Pr[B] = \Pr[A \cap B] = \Pr[B \mid A] \cdot \Pr[A] \]

by definition of conditional probability. Dividing by Pr[A] gives (15.1). ■
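Bayes' Rule can be checked numerically on the hockey example. The sketch below compares the direct computation of the reversed conditional probability with the value given by (15.1). Besides the events A and B from the text, it uses a third event C (the series lasts three games), which is our own hypothetical addition to show a case where the two conditional probabilities differ.

```python
from fractions import Fraction

# Outcome probabilities copied from Figure 15.2.
probs = {
    "WW": Fraction(1, 3), "WLW": Fraction(1, 18), "WLL": Fraction(1, 9),
    "LWW": Fraction(1, 9), "LWL": Fraction(1, 18), "LL": Fraction(1, 3),
}

def pr(event):
    return sum((probs[o] for o in event), Fraction(0))

def bayes(pr_a_given_x, pr_x, pr_a):
    """Pr[X | A] obtained from Pr[A | X], Pr[X], and Pr[A] via Bayes' Rule."""
    return pr_a_given_x * pr_x / pr_a

A = {"WW", "WLW", "LWW"}          # wins the series
B = {"WW", "WLW", "WLL"}          # wins the first game
C = {"WLW", "WLL", "LWW", "LWL"}  # hypothetical: series lasts three games

results = {}
for name, X in (("B", B), ("C", C)):
    a_given_x = pr(A & X) / pr(X)
    x_given_a_direct = pr(A & X) / pr(A)
    x_given_a_bayes = bayes(a_given_x, pr(X), pr(A))
    results[name] = (a_given_x, x_given_a_direct, x_given_a_bayes)
    print(name, a_given_x, x_given_a_direct, x_given_a_bayes)
```

For B all three values are 7/9, because Pr[A] = Pr[B]. For C the run prints 1/2, 1/3, 1/3: the conditional probabilities differ in the two directions, but the direct and Bayes computations agree.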
Next, let's look at a problem that even bothers us. 






15.3.2 A Coin Problem 

Suppose that someone hands you either a fair coin or a trick coin with heads on 
both sides. You flip the coin 100 times and see heads every time. What can you say 
about the probability that you flipped the fair coin? Remarkably, nothing! 

In order to make sense out of this outrageous claim, let's formalize the problem.
The sample space is worked out in the tree diagram shown in Figure 15.4. We
do not know the probability p that you were handed the fair coin initially; you
were just given one coin or the other.

[Tree diagram; its leaves are tabulated below. A check mark means the outcome is
in the event.]

coin given to you   result of 100 flips   probability    event A: given fair coin   event B: flipped all heads
fair coin           all heads             p/2^100                   ✓                          ✓
fair coin           some tails            p - p/2^100               ✓
trick coin          all heads             1 - p                                                ✓

Figure 15.4 The tree diagram for the coin-flipping problem.

Let A be the event that you were handed the fair coin, and let B be the event
that you flipped 100 straight heads. We're looking for Pr[A | B], the probability
that you were handed the fair coin, given that you flipped 100 heads. The outcome
probabilities are worked out in Figure 15.4. Plugging the results into the
definition of conditional probability gives:

\[
\begin{aligned}
\Pr[A \mid B] &= \frac{\Pr[A \cap B]}{\Pr[B]} \\
&= \frac{p/2^{100}}{1 - p + p/2^{100}} \\
&= \frac{p}{2^{100}(1 - p) + p}.
\end{aligned}
\]



This expression is very small for moderate values of p because of the 2^100 term
in the denominator. For example, if p = 1/2, then the probability that you were
given the fair coin is essentially zero.

But we do not know the probability p that you were given the fair coin. And
perhaps the value of p is not moderate; in fact, maybe p = 1 - 2^(-100). Then there
is nearly an even chance that you have the fair coin, given that you flipped 100
heads. In fact, maybe you were handed the fair coin with probability p = 1. Then
the probability that you were given the fair coin is, well, 1!
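The dependence on p is easy to explore with exact arithmetic. The following sketch evaluates the closed-form expression above for the three values of p discussed in the text; the function name `pr_fair_given_all_heads` is our own.

```python
from fractions import Fraction

def pr_fair_given_all_heads(p, flips=100):
    """Pr[A | B] = p / (2^flips * (1 - p) + p), where p = Pr[handed the fair coin]."""
    p = Fraction(p)
    return p / (2**flips * (1 - p) + p)

half = pr_fair_given_all_heads(Fraction(1, 2))
near_one = pr_fair_given_all_heads(1 - Fraction(1, 2**100))
certain = pr_fair_given_all_heads(1)

print(float(half))      # astronomically small, about 2^(-100)
print(float(near_one))  # very nearly 1/2
print(certain)          # 1
```

So the same evidence (100 straight heads) is compatible with a posterior anywhere between essentially 0 and 1, depending entirely on the unknown p.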

Of course, it is extremely unlikely that you would flip 100 straight heads, but in 
this case, that is a given from the assumption of the conditional probability. And so 
if you really did see 100 straight heads, it would be very tempting to also assume 
that p is not close to 1 and hence that you are very likely to have flipped the trick 
coin. 

We will encounter a very similar issue when we look at methods for estimation 
by sampling in Section 17.5.5. 



15.4 Conditional Identities 



15.4.1 The Law of Total Probability 

Breaking a probability calculation into cases simplifies many problems. The idea
is to calculate the probability of an event A by splitting into two cases based on
whether or not another event E occurs. That is, calculate the probability of A ∩ E
and A ∩ Ē. By the Sum Rule, the sum of these probabilities equals Pr[A]. Expressing
the intersection probabilities as conditional probabilities yields:

Rule 15.4.1 (Law of Total Probability, single event). If Pr[E] and Pr[Ē] are nonzero,
then

\[ \Pr[A] = \Pr[A \mid E] \cdot \Pr[E] + \Pr[A \mid \overline{E}] \cdot \Pr[\overline{E}]. \]

For example, suppose we conduct the following experiment. First, we flip a fair
coin. If heads comes up, then we roll one die and take the result. If tails comes up,
then we roll two dice and take the sum of the two results. What is the probability
that this process yields a 2? Let E be the event that the coin comes up heads,
and let A be the event that we get a 2 overall. Assuming that the coin is fair,
Pr[E] = Pr[Ē] = 1/2. There are now two cases. If we flip heads, then we roll
a 2 on a single die with probability Pr[A | E] = 1/6. On the other hand, if we
flip tails, then we get a sum of 2 on two dice with probability Pr[A | Ē] = 1/36.
Therefore, the probability that the whole process yields a 2 is

\[ \Pr[A] = \frac{1}{2} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{36} = \frac{7}{72}. \]

There is also a form of the rule to handle more than two cases.

Rule 15.4.2 (Law of Total Probability). If E_1, ..., E_n are disjoint events whose
union is the whole sample space, then:

\[ \Pr[A] = \sum_{i=1}^{n} \Pr[A \mid E_i] \cdot \Pr[E_i]. \]
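A minimal sketch of the Law of Total Probability, applied to the coin-then-dice experiment described above. The function `total_probability` and its input format (a list of pairs) are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def total_probability(cases):
    """Sum of Pr[A | E_i] * Pr[E_i] over a partition E_1, ..., E_n.

    `cases` is a list of (Pr[E_i], Pr[A | E_i]) pairs.
    """
    cases = list(cases)
    if sum(p for p, _ in cases) != 1:
        raise ValueError("the events E_i must partition the sample space")
    return sum(p * c for p, c in cases)

# Pr[roll a 2 | heads]: one fair die.
one_die = Fraction(1, 6)

# Pr[sum is 2 | tails]: two fair dice, counted over all 36 ordered rolls.
rolls = list(product(range(1, 7), repeat=2))
two_dice = Fraction(sum(1 for r in rolls if sum(r) == 2), len(rolls))

pr_two = total_probability([
    (Fraction(1, 2), one_die),   # heads
    (Fraction(1, 2), two_dice),  # tails
])
print(pr_two)  # 7/72
```

The same function handles more than two cases, as long as the case probabilities sum to 1.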

15.4.2 Conditioning on a Single Event 

The probability rules that we derived in Chapter 14 extend to probabilities condi- 
tioned on the same event. For example, the Inclusion-Exclusion formula for two 
sets holds when all probabilities are conditioned on an event C : 

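\[ \Pr[A \cup B \mid C] = \Pr[A \mid C] + \Pr[B \mid C] - \Pr[A \cap B \mid C]. \]

This follows from the fact that if Pr[C] ≠ 0, then

\[
\begin{aligned}
\Pr[A \cup B \mid C] &= \frac{\Pr[(A \cup B) \cap C]}{\Pr[C]} \\
&= \frac{\Pr[(A \cap C) \cup (B \cap C)]}{\Pr[C]} \\
&= \frac{\Pr[A \cap C] + \Pr[B \cap C] - \Pr[A \cap B \cap C]}{\Pr[C]} \\
&= \Pr[A \mid C] + \Pr[B \mid C] - \Pr[A \cap B \mid C].
\end{aligned}
\]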

It is important not to mix up events before and after the conditioning bar. For 
example, the following is not a valid identity: 

False Claim.

\[ \Pr[A \mid B \cup C] = \Pr[A \mid B] + \Pr[A \mid C] - \Pr[A \mid B \cap C]. \tag{15.2} \]

A counterexample is shown in Figure 15.5. In this case, Pr[A | B] = 1/2,
Pr[A | C] = 1/2, Pr[A | B ∩ C] = 1, and Pr[A | B ∪ C] = 1/3. However,
since 1/3 ≠ 1/2 + 1/2 - 1, Equation 15.2 does not hold.
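One concrete instance of such a counterexample can be built from a handful of equally likely outcomes. The sets below are our own hypothetical choice, arranged so that B ∩ C lies inside A while B - C and C - B lie outside it; they reproduce the four probabilities quoted above.

```python
from fractions import Fraction

# Hypothetical uniform sample space {1, ..., 6}; only the sets below matter.
A = {2, 4}
B = {1, 2}
C = {2, 3}

def cond_pr(x, given):
    """Pr[X | given] in a uniform space: |X ∩ given| / |given|."""
    return Fraction(len(x & given), len(given))

# The false claim (15.2): conditioning events combined after the bar.
lhs_false = cond_pr(A, B | C)
rhs_false = cond_pr(A, B) + cond_pr(A, C) - cond_pr(A, B & C)
print(lhs_false, rhs_false)  # 1/3 0

# The valid identity: all probabilities conditioned on the same event C.
lhs_valid = cond_pr(A | B, C)
rhs_valid = cond_pr(A, C) + cond_pr(B, C) - cond_pr(A & B, C)
print(lhs_valid, rhs_valid)  # 1/2 1/2
```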

So you're convinced that this equation is false in general, right? Let's see if you 
really believe that. 

15.4.3 Discrimination Lawsuit 

Several years ago there was a sex discrimination lawsuit against a famous
university. A female math professor was denied tenure, allegedly because she was
a woman. She argued that in every one of the university's 22 departments, the
percentage of male applicants accepted was greater than the percentage of female
applicants accepted. This sounds very suspicious!

[Venn diagram of events A, B, and C within the sample space.]

Figure 15.5 A counterexample to Equation 15.2. Event A is the gray rectangle,
event B is the rectangle with vertical stripes, and event C is the rectangle with
horizontal stripes. B ∩ C lies entirely within A, while B - C and C - B are entirely
outside of A.

However, the university's lawyers argued that across the university as a whole, 
the percentage of male applicants accepted was actually lower than the percentage 
of female applicants accepted. This suggests that if there was any sex discrimi- 
nation, then it was against men! Surely, at least one party in the dispute must be 
lying. 

Let's simplify the problem and express both arguments in terms of conditional 
probabilities. To simplify matters, suppose that there are only two departments, EE 
and CS, and consider the experiment where we pick a random applicant. Define 
the following events: 

• Let A be the event that the applicant is accepted.

• Let F_EE be the event that the applicant is a female applying to EE.

• Let F_CS be the event that the applicant is a female applying to CS.

• Let M_EE be the event that the applicant is a male applying to EE.

• Let M_CS be the event that the applicant is a male applying to CS.

Assume that all applicants are either male or female, and that no applicant applied
to both departments. That is, the events F_EE, F_CS, M_EE, and M_CS are all
disjoint.

Department   Applicants                           Acceptance rate
CS           0 females accepted, 1 applied        0%
             50 males accepted, 100 applied       50%
EE           70 females accepted, 100 applied     70%
             1 male accepted, 1 applied           100%
Overall      70 females accepted, 101 applied     ≈ 70%
             51 males accepted, 101 applied       ≈ 51%

Table 15.1 A scenario where females are less likely to be admitted than males in
each department, but more likely to be admitted overall.



In these terms, the plaintiff is making the following argument: 

\[ \Pr[A \mid F_{EE}] < \Pr[A \mid M_{EE}] \quad \text{and} \quad \Pr[A \mid F_{CS}] < \Pr[A \mid M_{CS}]. \]

That is, in both departments, the probability that a woman is accepted for tenure is
less than the probability that a man is accepted. The university retorts that overall,
a woman applicant is more likely to be accepted than a man; namely that

\[ \Pr[A \mid F_{EE} \cup F_{CS}] > \Pr[A \mid M_{EE} \cup M_{CS}]. \]

It is easy to believe that these two positions are contradictory. In fact, we might 
even try to prove this by adding the plaintiff's two inequalities and then arguing as 
follows: 

\[
\begin{aligned}
\Pr[A \mid F_{EE}] + \Pr[A \mid F_{CS}] &< \Pr[A \mid M_{EE}] + \Pr[A \mid M_{CS}] \\
\Pr[A \mid F_{EE} \cup F_{CS}] &< \Pr[A \mid M_{EE} \cup M_{CS}].
\end{aligned}
\]

The second line exactly contradicts the university's position! But there is a big 
problem with this argument; the second inequality follows from the first only if we 
accept the false identity (15.2). This argument is bogus! Maybe the two parties do 
not hold contradictory positions after all! 

In fact, Table 15.1 shows a set of application statistics for which the assertions of 
both the plaintiff and the university hold. In this case, a higher percentage of males 
were accepted in both departments, but overall a higher percentage of females were 
accepted! Bizarre! 
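The figures in Table 15.1 can be checked directly. The sketch below encodes the table as (accepted, applied) pairs and evaluates both parties' claims; the dictionary layout and the helper `rate` are our own illustrative choices.

```python
from fractions import Fraction

# (accepted, applied) counts from Table 15.1.
table = {
    ("CS", "F"): (0, 1),
    ("CS", "M"): (50, 100),
    ("EE", "F"): (70, 100),
    ("EE", "M"): (1, 1),
}

def rate(sex, depts=("CS", "EE")):
    """Acceptance rate for applicants of `sex` across the given departments."""
    accepted = sum(table[(d, sex)][0] for d in depts)
    applied = sum(table[(d, sex)][1] for d in depts)
    return Fraction(accepted, applied)

# Plaintiff: in each department, women are accepted at a lower rate than men.
plaintiff_holds = all(rate("F", (d,)) < rate("M", (d,)) for d in ("CS", "EE"))

# University: overall, women are accepted at a higher rate than men.
university_holds = rate("F") > rate("M")

print(plaintiff_holds, university_holds)  # True True
print(rate("F"), rate("M"))               # 70/101 51/101
```

Both claims hold at once because the overall rates weight the departments by very different numbers of applicants.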